-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(specs): implement sample method #116
Conversation
275fbc7
to
ce0445a
Compare
Hi @aar65537, apologies for the late response and thanks very much for the contribution! Had a chat with the team and we can't merge the PR as it is now. Problem is it's not guaranteed to generate valid observations with the sample method (this is what the generators are for) and while it will generate legal actions and thus it would be useful for e-greedy type exploration, we can't have it work for action specs but not for observation specs. Honestly I'm not sure if there is a clean solution for this problem and as such I'm going to close this PR, but if you do think of something please comment on this/open a new PR/issue. |
Thank you for the response @sash-a. In my mind, Perhaps, it would be best to include |
I'm still not 100% convinced by this. I agree that it seems like a nice feature, however the useful part of it is to generate actions for e-greedy like exploration (I don't see another use as we have the I do get that generating illegal states is acceptable behaviour as I think this is how gym specs work so being consistent with those isn't a bad idea, even though the observations would be technically invalid. This was mostly @clement-bonnet's issues with the PR so would be good to get your views on this 😁 |
e-greedy exploration is probably the main use case. However, a |
The vmapping becomes slightly tricky because there is nothing to vmap over as the sample method takes no arguments. I'll try ping @clement-bonnet and see if he has any further opinions 😄 |
Closes #103